task variant
When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1
McCoy, R. Thomas, Yao, Shunyu, Friedman, Dan, Hardy, Mathew D., Griffiths, Thomas L.
In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction. Here we investigate whether these issues persist with o1, a new system from OpenAI that differs from previous LLMs in that it is optimized for reasoning. We find that o1 substantially outperforms previous LLMs in many cases, with particularly large improvements on rare variants of common tasks (e.g., forming acronyms from the second letter of each word in a list, rather than the first letter). Despite these quantitative improvements, however, o1 still displays the same qualitative trends that we observed in previous systems. Specifically, o1 -- like previous LLMs -- is sensitive to the probability of examples and tasks, performing better and requiring fewer "thinking tokens" in high-probability settings than in low-probability ones. These results show that optimizing a language model for reasoning can mitigate but might not fully overcome the language model's probability sensitivity.
Continual Reinforcement Learning with TELLA
Fendley, Neil, Costello, Cash, Nguyen, Eric, Perrotta, Gino, Lowman, Corey
In the last decade, reinforcement learning (RL) with deep neural networks has been successfully applied in a wide variety of domains (Arulkumaran et al., 2017). In typical RL scenarios, the agent learns a single task, defined as a single Partially Observable Markov Decision Process (POMDP). Training RL agents that instead continually learn across multiple environments is a challenging problem, made more difficult by a lack of reproducible experiments and standard metrics for comparing continual learning approaches. To address this, we introduce TELLA, a tool for the test and evaluation of lifelong learning agents. Researchers can define and share their own curricula over various learning environments or run against a curriculum created under the DARPA Lifelong Learning Machines (L2M) Program.
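A hypothetical sketch of the curriculum-driven evaluation loop this abstract describes is below. The `Agent` class, `learn`/`evaluate` methods, and curriculum structure are illustrative assumptions, not TELLA's actual API; the key idea shown is that after each learning block the agent is re-evaluated on all tasks seen so far, which is what exposes forgetting.

```python
# Hypothetical continual-learning curriculum loop in the spirit of the
# abstract above. Names like `Agent` are illustrative placeholders,
# NOT TELLA's actual API.
import random

class Agent:
    """Stub lifelong learner: tracks a crude per-task proficiency score."""
    def __init__(self):
        self.skill = {}  # task name -> proficiency estimate

    def learn(self, task: str) -> None:
        self.skill[task] = self.skill.get(task, 0.0) + random.random()

    def evaluate(self, task: str) -> float:
        return self.skill.get(task, 0.0)

# A curriculum is an ordered sequence of learning blocks; after each
# block, evaluate on every task seen so far to measure forgetting.
curriculum = ["task_a", "task_b", "task_a", "task_c"]
agent, seen = Agent(), []
for task in curriculum:
    agent.learn(task)
    seen.append(task)
    for t in dict.fromkeys(seen):  # unique tasks, first-seen order
        print(f"after {task}: eval on {t} = {agent.evaluate(t):.2f}")
```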
Allocation of Multi-Robot Tasks with Task Variants
Task allocation is a well-studied problem. Most prior formulations assume that each task is associated with a unique set of resource requirements; in the multi-robot task allocation problem, these requirements can be satisfied by a coalition of robots. In this paper, we introduce a more general formulation of the multi-robot task allocation problem that allows more than one option for specifying a task's requirements: satisfying any one of the options satisfies the task. We refer to this new problem as the multi-robot task allocation problem with task variants. First, we show theoretically that, fortunately, this extension does not change the complexity class; the problem remains NP-complete. For solution methods, we adapt two previous greedy methods for the task allocation problem without task variants and analyze their effectiveness. In particular, we "flatten" the new problem into the problem without task variants, modify the previous methods to solve the flattened problem, and prove that the bounds still hold. Finally, we thoroughly evaluate these two methods, along with a random baseline, to demonstrate their efficacy on the new problem.
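A hedged sketch of the "flattening" idea described above: each task with several requirement options is expanded into one candidate per (task, variant) pair, and a greedy pass then treats those candidates like ordinary variant-free tasks while keeping at most one variant per task. The data model and greedy ordering rule here are illustrative assumptions, not the paper's algorithm.

```python
# Flatten tasks-with-variants into (task, variant, requirement) candidates,
# then run a simple greedy allocation over the flattened candidates.
from collections import Counter

robots = Counter({"camera": 2, "arm": 1, "lidar": 1})  # available units by capability

# task -> list of variant requirement options (any one option suffices)
tasks = {
    "inspect": [Counter({"camera": 2}), Counter({"lidar": 1})],
    "grasp":   [Counter({"arm": 1, "camera": 1})],
}

def flatten(tasks):
    """Expand variants into (task, variant_index, requirement) candidates."""
    return [(t, i, req) for t, opts in tasks.items() for i, req in enumerate(opts)]

assigned = {}
# Greedy rule (an assumption): try cheaper requirement options first.
for task, i, req in sorted(flatten(tasks), key=lambda c: sum(c[2].values())):
    if task in assigned:                                # at most one variant per task
        continue
    if all(robots[c] >= n for c, n in req.items()):     # coalition available?
        robots -= req                                   # commit those robots
        assigned[task] = i
print(assigned)  # e.g. {'inspect': 1, 'grasp': 0}
```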
Perturbation Training for Human-Robot Teams
Ramakrishnan, Ramya, Zhang, Chongjie, Shah, Julie
In this work, we design and evaluate a computational learning model that enables a human-robot team to co-develop joint strategies for performing novel tasks that require coordination. The joint strategies are learned through "perturbation training," a human team-training strategy in which team members practice variations of a given task to help the team generalize to new variants of that task. We formally define the problem of human-robot perturbation training and develop and evaluate the first end-to-end framework for such training, which incorporates a multi-agent transfer learning algorithm, a human-robot co-learning framework, and a communication protocol. Our transfer learning algorithm, Adaptive Perturbation Training (AdaPT), is a hybrid of transfer and reinforcement learning techniques that learns quickly and robustly on new task variants. We empirically validate the benefits of AdaPT by comparing it to other hybrid reinforcement and transfer learning techniques that transfer knowledge from multiple source tasks to a single target task. We also demonstrate that AdaPT's rapid learning supports live interaction between a person and a robot, during which the human-robot team trains to achieve a high level of performance on new task variants. We augment AdaPT with a co-learning framework and a computational bidirectional communication protocol so that the robot can co-train with a person during live interaction. Results from large-scale human-subject experiments (n=48) indicate that AdaPT enables an agent to learn in a manner compatible with a human's own learning process, and that a robot undergoing perturbation training with a human achieves a high level of team performance. Finally, we demonstrate that human-robot training using AdaPT in a simulation environment produces effective performance for a team that includes an embodied robot partner.
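A minimal sketch of the perturbation-training idea above: practice several perturbed variants of a task, then warm-start learning on a new variant from the most similar practiced one. This is a crude stand-in for AdaPT's source-task selection, not the authors' algorithm; the chain-world task, tabular Q-learning, and the goal-distance similarity measure are all illustrative assumptions.

```python
# Practice perturbed variants (different goal states), then warm-start
# a new variant's Q-table from the closest practiced variant.
import random

N = 10  # states 0..N-1 in a 1-D chain; actions: 0 = left, 1 = right

def q_learn(goal, q=None, episodes=200, alpha=0.5, gamma=0.9, eps=0.1):
    q = q or {(s, a): 0.0 for s in range(N) for a in (0, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(4 * N):
            a = random.choice((0, 1)) if random.random() < eps \
                else max((0, 1), key=lambda act: q[(s, act)])
            s2 = max(0, min(N - 1, s + (1 if a else -1)))
            r = 1.0 if s2 == goal else 0.0
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, 0)], q[(s2, 1)]) - q[(s, a)])
            s = s2
            if s == goal:
                break
    return q

# "Perturbation training": practice variants with different goals.
library = {g: q_learn(g) for g in (3, 6, 9)}

# New variant: warm-start from the most similar practiced variant.
new_goal = 7
source = min(library, key=lambda g: abs(g - new_goal))
q_new = q_learn(new_goal, q=dict(library[source]), episodes=50)
print(f"warm-started goal={new_goal} from practiced goal={source}")
```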